Beyond the Listening Test: An Interactive Approach to TTS Evaluation
نویسندگان
چکیده
Traditionally, subjective text-to-speech (TTS) evaluation is performed through audio-only listening tests, where participants evaluate unrelated, context-free utterances. The ecological validity of these tests is questionable, as they do not represent real-world end-use scenarios. In this paper, we examine a novel approach to TTS evaluation in an imagined end-use, via a complex interaction with an avatar. 6 different voice conditions were tested: Natural speech, Unit Selection and Parametric Synthesis, in neutral and expressive realizations. Results were compared to a traditional audio-only evaluation baseline. Participants in both studies rated the voices for naturalness and expressivity. The baseline study showed canonical results for naturalness: Natural speech scored highest, followed by Unit Selection, then Parametric synthesis. Expressivity was clearly distinguishable in all conditions. In the avatar interaction study, participants rated naturalness in the same order as the baseline, though with smaller effect size; expressivity was not distinguishable. Further, no significant correlations were found between cognitive or affective responses and any voice conditions. This highlights 2 primary challenges in designing more valid TTS evaluations: in real-world use-cases involving interaction, listeners generally interact with a single voice, making comparative analysis unfeasible, and in complex interactions, the context and content may confound perception of voice quality.
منابع مشابه
Power Spectral Densit Equalization of Large Sp Concatenative T
This paper proposes a channel equalization algorithm for a large speech database with application in concatenative TTS systems. The convolutional channel distortion is equalized by comparing the power spectral densities (PSDs) of utterances of different recording sessions. Autoregressive linear filters are designed on a corpus level and are used offline to filter the corresponding sentences to ...
متن کاملSynthesizing and evaluating an artificial language: klingon
The synthesis of an artificial language can provide some interesting extensions for the evaluation of text-to-speech (TTS) systems. For the alternative evaluation of the TTS system DRESS a new module for the artificial language Klingon has been developed. The linguistic and phonetic structure of Klingon can be modeled mainly by rules, with less exceptions. This contribution introduces the multi...
متن کاملThe Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems
This paper presents an attempt to design listening tests for the Czech synthesis speech evaluation. The design is based on standardized and widely used listening tests for English; therefore, we can benefit from the advantages provided by standards. Bearing the Czech language phenomena in mind, we filled the standard frameworks of several listening tests, especially the MRT (Modified Rhyme Test...
متن کاملRobust Methodology for TTS Enhancement Evaluation
The paper points to problematic and usually neglected aspects of using listening tests for TTS evaluation. It shows that simple random selection of phrases to be listened to may not cover those cases which are relevant to the evaluated TTS system. Also, it shows that a reliable phrase set cannot be chosen without a deeper knowledge of the distribution of differences in synthetic speech, which a...
متن کاملProsodic Phrases and Semantic Accents in Speech Corpus for Czech TTS Synthesis
We describe a statistical method for assignment of prosodic phrases and semantic accents in read speech data. The method is based on statistical evaluation of listening test data by a maximum-likelihood approach with parameters estimated by an EM algorithm. We also present linguistically relevant quantitative results about the prosodic phrase and semantic accent distribution in 250 Czech
متن کامل